Towards Maximizing the Area Under the ROC Curve for Multi-Class Classification Problems
Authors
Abstract
The Area Under the ROC Curve (AUC) metric has been highly successful in binary classification problems because it measures the performance of classifiers without making any specific assumptions about the class distribution or misclassification costs. This is desirable because the class distribution and misclassification costs may be unknown during training or may even change in the environment. MAUC, the extension of AUC to multi-class problems, has also attracted considerable attention. However, despite the emergence of approaches for training classifiers with large AUC, little has been done for MAUC. This paper analyzes MAUC in depth and reveals that the maximization of MAUC can be achieved by decomposing the multi-class problem into a number of independent sub-problems. These sub-problems are formulated as "learning to rank" problems, for which well-established methods already exist. Based on this analysis, a method that employs the RankBoost algorithm as the sub-problem solver is proposed to obtain classification systems with maximum MAUC. Empirical studies show the advantages of the proposed method over eight other relevant methods. Because MAUC is important to both multi-class cost-sensitive learning and class-imbalanced learning, the proposed method is a general technique for both problems. It can also be generalized to accommodate other learning algorithms as the sub-problem solvers.
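To make the pairwise structure behind the decomposition concrete, the following is a minimal sketch of how MAUC might be computed. It assumes that MAUC is the commonly used Hand and Till average of pairwise AUCs, which the abstract itself does not spell out. The function names `auc` and `mauc` and the `scores[k][c]` layout are illustrative choices, not taken from the paper, and the sketch only evaluates the metric; it does not implement the proposed RankBoost-based training method.

```python
from itertools import combinations


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mauc(scores, labels, classes):
    """Average of pairwise AUCs over all unordered class pairs (assumed definition).

    scores[k][c] is the score the classifier gives example k for class c.
    """
    pairs = list(combinations(classes, 2))
    total = 0.0
    for i, j in pairs:
        idx_i = [k for k, y in enumerate(labels) if y == i]
        idx_j = [k for k, y in enumerate(labels) if y == j]
        # A(i|j): separate class i from class j using the class-i score.
        a_ij = auc([scores[k][i] for k in idx_i], [scores[k][i] for k in idx_j])
        # A(j|i): separate class j from class i using the class-j score.
        a_ji = auc([scores[k][j] for k in idx_j], [scores[k][j] for k in idx_i])
        total += (a_ij + a_ji) / 2.0
    return total / len(pairs)


# Hypothetical toy data: six examples, three classes, one score per class.
labels = [0, 0, 1, 1, 2, 2]
scores = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
]
example_mauc = mauc(scores, labels, classes=[0, 1, 2])
```

Under this assumed definition, each term in the sum depends only on how one class's score ranks the examples of two classes, which is the kind of ranking sub-problem the abstract refers to.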
Similar resources
Technical Report No: BU-CE-1001 A Discretization Method based on Maximizing the Area Under ROC Curve
We present a new discretization method based on the Area under the ROC Curve (AUC) measure. Maximum Area under ROC Curve Based Discretization (MAD) is a global, static and supervised discretization method. It discretizes a continuous feature in such a way that the AUC based only on that feature is maximized. The proposed method is compared with alternative discretization methods such as Entropy-MDLP (...
Predicting The Type of Malaria Using Classification and Regression Decision Trees
Maryam Ashoori (1, *), Fatemeh Hamzavi (2). (1) School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran; (2) School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran. Abstract. Background: Malaria is an infectious disease infecting 200 to 300 million people annually. Environme...
Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation
This review provides the basic principles and rationale for ROC analysis of rating and continuous diagnostic test results versus a gold standard. Derived indexes of accuracy, in particular the area under the curve (AUC), have a meaningful interpretation for distinguishing diseased from healthy subjects. The methods of estimating AUC and testing it in a single diagnostic test and also in comparative studies...
A simplified extension of the Area under the ROC to the multiclass domain
The Receiver Operator Characteristic (ROC) plot allows a classifier to be evaluated and optimised over all possible operating points. The Area Under the ROC (AUC) has become a standard performance evaluation criterion in two-class pattern recognition problems, used to compare different classification algorithms independently of operating points, priors, and costs. Extending the AUC to the multi...
An Empirical Study of MAUC in Multi-class Problems with Uncertain Cost Matrices
Cost-sensitive learning relies on the availability of a known and fixed cost matrix. However, in some scenarios, the cost matrix is uncertain during training, and retraining a classifier after the cost matrix is specified would not be an option. For binary classification, this issue can be successfully addressed by methods maximizing the Area Under the ROC Curve (AUC) metric. Since the AUC can me...